Wentao Bao

Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction

Apr 10, 2025

Window Token Concatenation for Efficient Visual Large Language Models

Apr 05, 2025

Visual Large Language Models for Generalized and Specialized Applications

Jan 06, 2025

Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection

Nov 17, 2024

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Sep 22, 2024

MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos

Sep 04, 2024

Facial Affective Behavior Analysis with Instruction Tuning

Apr 07, 2024

Latent Space Energy-based Model for Fine-grained Open Set Recognition

Sep 19, 2023

On Model Explanations with Transferable Neural Pathways

Sep 18, 2023

Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting

Jul 17, 2023